Abstract

Background: Malaria causes over one million deaths annually, posing an enormous health and economic burden 
in endemic regions. The completion of genome sequencing of the causative agents, a group of parasites in the 
genus Plasmodium, revealed potential drug and vaccine candidates. However, genomics-driven target discovery has 
been significantly hampered by our limited knowledge of the cellular networks associated with parasite 
development and pathogenesis. In this paper, we propose an approach based on aligning neighborhood PPI 
subnetworks across species to identify network components in the malaria parasite P. falciparum. 

Results: Instead of only relying on sequence similarities to detect functional orthologs, our approach measures the 
conservation between the neighborhood subnetworks in protein-protein interaction (PPI) networks in two species, 
P. falciparum and E. coli. 1,082 P. falciparum proteins were predicted as functional orthologs of known 
transcriptional regulators in the E. coli network, including general transcriptional regulators, parasite-specific 
transcriptional regulators in the ApiAP2 protein family, and other potential regulatory proteins. They are implicated 
in a variety of cellular processes involving chromatin remodeling, genome integrity, secretion, invasion, protein 
processing, and metabolism. 

Conclusions: In this proof-of-concept study, we demonstrate that a subnetwork alignment approach can reveal 
previously uncharacterized members of the subnetworks, which opens new opportunities to identify potential 
therapeutic targets and provide new insights into parasite biology, pathogenesis and virulence. This approach can 
be extended to other systems, especially those with poor genome annotation and a paucity of knowledge about 
cellular networks. 



Background 

Malaria is a major threat to public health and economic 
development in endemic regions. About 300-500 million 
cases are reported, and 1-2 million people die from 
malaria every year. Children and pregnant women are 
among the hardest hit of malaria victims. Five parasite 
species, P. falciparum, P. vivax, P. malariae, P. ovale, and 
P. knowlesi, cause human malaria. P. falciparum is the 
most virulent and widespread one. 

The continuing morbidity and mortality of malaria are largely 
due to the rapid development of parasite resistance to 
currently available drugs and the increasing insecticide 
resistance of mosquito vectors. It is imperative to search for 
new lines of antimalarial drug and vaccine targets. The 
complete genome sequencing of P. falciparum and its 
sibling species and strains [1-6], the subsequent transcrip- 
tomic [7-30], proteomic [31-46], metabolic [47-54], interac- 
tomic analyses [55-60] and, most recently, next-generation 
sequencing [61-63] efforts have set the stage for a quantum 
leap in our understanding of the fundamental processes of 
the parasite life cycle and mechanisms of drug resistance, 
immune evasion, and pathogenesis. However, the paradigm 
of -omics driven target discovery has been significantly 
hampered by our limited knowledge of the cellular net- 
works associated with parasite survival, development, trans- 
mission, invasion, and pathogenesis. 

We propose to circumvent this limitation using a sub- 
network alignment approach. It has been shown that net- 
work alignment offers an effective means to elucidate 
network structure and predict protein orthologs [64-69]. 
Our approach extends the concept of network alignment 
to align subnetworks of proteins for measuring their 
functional relations in a network context. It is particularly 
useful when the genome of interest suffers from poor 
annotation due to low or no sequence similarity to 
known proteins, a significant problem for P. falciparum, 
as over 60% of the predicted open reading frames (ORFs) 
were annotated as "hypothetical" without functional 
assignment [5]. Previously, we developed a supervised 
learning algorithm for remote homology detection based 
on support vector machines (SVMs) and profile kernels 
[70], and predicted a group of novel proteases [71], 
which were implicated in networks associated with sig- 
naling, stress response, cell cycle progression, metabo- 
lism, and invasion [72]. In this study, we attempt to 
identify network components beyond sequence-similarity 
searches. 

PPI network alignment algorithms are designed to 
match nodes in two PPI networks such that the number of 
conserved interactions between the orthologs in the networks 
is maximized. Current network alignment algorithms 
take either a local or a global approach. 
Local network alignment [64-69] aims at detecting pairs of 
subnetwork modules with many functional orthologs. 
Typically, these algorithms start from conserved regions 
and expand the regions greedily in the two PPI networks. 
Global network alignment attempts to find the best con- 
sistent mapping of the proteins in the two PPI networks 
for maximizing the number of conserved interactions. Previous 
studies tackled the global network alignment problem with 
Markov random fields (MRFs) [73], combinatorial graph 
matching by optimization [74-76], and random walks on a 
Kronecker product graph of two PPI networks [77]. 

Since P. falciparum shares very few orthologous proteins 
with other species, the conserved interactions 
between the P. falciparum PPI network and the PPI networks 
of model organisms are too few to reveal meaningful 
alignments. Thus, network alignment is not directly 
applicable to the study of the P. falciparum PPI network. 
Instead of focusing on detecting alignments, we propose 
to measure the functional relation between P. falciparum 
proteins and the annotated proteins in another species by 
aligning the neighborhood subnetworks of the two proteins. 
The neighborhood subnetwork of a protein (called 
the central protein) contains the nearby neighbors reachable 
from the protein through a small number of hops in 
the PPI network. Our assumption is that the neighborhood 
subnetwork captures information on the functional 
role of the central protein. Based on this assumption, if 
two proteins are functional orthologs, their neighborhood 
subnetworks will share similar paths or other structural 
patterns. Our subgraph alignment approach is designed 
to summarize the structural similarity between neighborhood 
subnetworks for ortholog prediction. 

As a proof of concept, we chose to predict the components 
of the transcriptional regulation network in 
P. falciparum. This network was chosen because (1) the parasite 
employs exquisite regulatory machinery for gene expression to 
ensure timely and accurate coordination of parasite 
growth, development, infection, and virulence, and (2) very 
little is known about the components, dynamics, and design 
principles of this network. New discoveries of network 
components could significantly fill our knowledge gaps 
and possibly lead to new short lists of poorly understood 
and poorly annotated proteins for functional 
characterization. The reference network used was 
from Escherichia coli. Detection of network similarities 
among eukaryotes and among prokaryotes has been 
demonstrated [73,78], but detection of similarities between 
these two groups is a more challenging problem. The ability 
to make comparisons across such a wide phylogenetic 
gap means, firstly, that evolutionarily conserved (and 
therefore significant) subnetworks can be detected and, 
secondly, that it is possible to search beyond more closely 
related strains. This is especially significant in cases like 
P. falciparum, where the immediate relatives reveal comparatively 
little about its functional subsystems. 

Results and discussion 

Module-based subnetwork alignments predicted 1,082 
components in transcriptional regulation network in 
P. falciparum 

It is a common belief that the malaria parasite possesses a 
complex and orchestrated transcriptional regulatory sys- 
tem [79,80]. However, only a small number of transcrip- 
tional regulators have been identified, including a 
conserved set of basic transcription factors [81] and those 
predicted based on parasite developmental microarray 
expression profiles and motif analysis [82-84]. A recent 
study by Bischoff and Vaquero [85] combining literature 
searches, motif finding, and transcriptomic, proteomic, 
and interactomic analyses expanded this list to include 
proteins related to chromatin functions and remodeling. 

Our functional module-based subnetwork alignments 
predicted that 1,082 P. falciparum proteins were functional 
orthologs of known transcriptional regulators in 
the E. coli network (Additional file 1). Of these predicted 
functional orthologs, 37% were annotated as "putative 
uncharacterized proteins" or "conserved Plasmodium 
proteins" of unknown function. This is in agreement with 
the fact that 10 years after the completion of genome 
sequencing, the proportion of ORFs with no functional 
assignment has only been reduced to 45% [86]. Functional 
enrichment analysis [87] revealed that 31 Gene 
Ontology (GO) terms were over-represented (p < 0.05), 
including processes that are well known to be associated 
with transcriptional regulation, such as proteolysis 
[72], response to stimulus, and proteasome activity 
(Figure 1). 

Cai et al. BMC Systems Biology 2012, 6(Suppl 3):S5 
http://www.biomedcentral.com/1752-0509/6/S3/S5 

General transcriptional regulators 

The predicted functional orthologs include several general 
transcriptional regulators (Table 1) that are commonly 
present in a wide variety of species. The first is basic transcription 
factor 3b (accession number PF14_0241). It was 
found via yeast two-hybrid (Y2H) analysis [57] to have 
direct PPIs with a nascent polypeptide-associated complex 
alpha chain protein (PFF1050w), the erythrocyte binding 
antigen-181 (EBA-181, PFA0125c), and a putative coronin 
binding protein (PFF1110c), suggesting that it may be 
involved in protein folding, immune evasion, and cellular 
actin dynamics (Figure 2). The second is a putative 
CCAAT-binding transcription factor (PF14_0374). A Y2H 
assay [57] showed that it had PPIs with six proteins. Two 
of these proteins are likely involved in global transcription: 
(a) a putative NOT1 protein (PF11_0049); proteins 
in the NOT1 family were shown to regulate the 
activity of general transcription factor TFIID [88]; and (b) 
a conserved Plasmodium protein (PF14_0603) that has a 
functional domain, RPC4, which comprises a subunit of 
the tRNA-specific polymerase RNA Pol III. The third 
interacting protein is a merozoite surface protein 7 
(MSP7) precursor (PF13_0197), which is a regulator of 
parasite growth and a surface antigen regarded as a potential 
vaccine target [89]. The fourth protein associated with 
PF14_0374 is a conserved Plasmodium membrane protein 
(PF14_0315) that contains two FYVE/PHD zinc fingers for 
binding to potential target molecules. The remaining two 
proteins associated with PF14_0374 are 40S ribosomal proteins 
S10 (PF07_0080) and S20e (PF10_0038), indicating 
interactions between transcription and translation 
(Figure 2). 

Figure 1 A graphical representation of the results of a Gene Ontology analysis done using BiNGO. The node size is proportional to the 
number of proteins represented by that GO term. The color represents the P-value for each enriched GO term as shown in the scale; white 
nodes are not enriched. The nodes are positioned to approximate their level in the Gene Ontology. 

Table 1 Representative P. falciparum proteins that were predicted to be involved in transcriptional regulation 

Functional category               PlasmoDB Accession Number   Annotation 
General transcription regulators  PF14_0241                   putative basic transcription factor 3b 
                                  PF14_0374                   putative CCAAT-binding transcription factor 
                                  PF14_0608                   putative YL1 nuclear protein 
ApiAP2                            PFL1085w                    putative transcription factor with 1 AP2 domain 
                                  PF11_0442                   putative transcription factor with 1 AP2 domain 
                                  PF14_0079                   putative transcription factor with 1 AP2 domain 
                                  PF11_0091                   putative transcription factor with 1 AP2 domain 
                                  PF14_0633                   putative transcription factor with 1 AP2 domain 
                                  PFD0985w                    putative transcription factor with 2 AP2 domains 
                                  PFL1900w                    putative transcription factor with 2 AP2 domains 
                                  PF07_0126                   putative transcription factor with 2 AP2 domains 
                                  PFE0840c                    putative transcription factor with 2 AP2 domains 
                                  PF11_0404                   putative transcription factor with 3 AP2 domains 
                                  PF10_0075                   putative transcription factor with 3 AP2 domains 
Chromosome organization           PFD0685c                    structural maintenance of chromosomes protein 3 homolog 
                                  MAL13P1.96                  structural maintenance of chromosomes protein 2 
Zinc finger proteins              PF10_0091                   putative zinc finger protein, C2H2 type 
                                  PFL0465c                    zinc finger transcription factor (Krox1), C2H2 type 
                                  MAL7P1.155                  putative zinc finger, C3HC4 type 
                                  [illegible]                 putative zinc finger, C3HC4 type 
                                  PF10_0186                   putative zinc finger C-x8-C-x5-C-x3-H type 
                                  MAL7P1.68                   putative zinc finger protein, DHHC type 
                                  PF14_0197                   putative zinc finger protein, DNL type 
                                  PFD0970c                    putative zinc finger protein, CW type 
                                  PF10_0143                   putative transcriptional activator ADA2 
Others                            PFE0870w                    putative transcriptional regulator 
                                  PF14_0170                   putative NOT family protein 

A putative YL1 nuclear protein (PF14_0608) was pre- 
dicted to be a transcriptional regulator. It has two func- 
tional domains YL1 (Pfam accession PF05764) and YL1 
C-terminal domain (PF08265), both of which are typical 
DNA binding domains. This protein may be related to 
chromatin remodeling. In addition, a Y2H assay using 
this protein as a bait pulled out a chloroquine resistance 
marker protein (PF14_0463) (Figure 2). 

Apicomplexan-specific ApiAP2 transcriptional regulators 

Most interestingly, our subnetwork alignments also 
predicted 11 putative transcriptional regulators belonging 
to the Apicomplexan-specific ApiAP2 family. A 
characteristic feature of this family is the presence of 
the Apetala2 (AP2) domain. AP2 transcription factors 
play a pivotal role in floral development in plants [90]. 
The recent discovery of AP2 in the Apicomplexa, the 
phylum to which malaria parasites belong, suggested 
that the ApiAP2 proteins were derived from bacteria or 
the apicoplast progenitor via transposons, followed by 
lineage-specific radiation [91]. These ApiAP2 proteins, 
in addition to regulating heterochromatin formation 
and genome integrity, may have developed novel parasite-specific 
functions such as antigenic variation, invasion, 
and sporozoite development [92-95]. P. falciparum 
possesses 27 ApiAP2 members. Among the 11 ApiAP2 
proteins predicted by our network alignments, five contain 
a single AP2 domain, four contain two AP2 domains, 
and two contain three AP2 domains (Figure 3). Analyzing 
the protein-protein association data from the STRING 
database [4], in conjunction with the data from the Y2H 
assays, temporal microarray experiments, proteomics, and 
the literature, revealed that these 11 ApiAP2 proteins are 
associated with 1-17 proteins in the cellular networks (Figure 4 
and Additional file 2). At least four ApiAP2 proteins 
(PF07_0126, PFD0985w, PF11_0404 and PF10_0075) have 
PPIs, suggesting that they play a central role in transcriptional 
regulation. 
Figure 2 A graph showing the proteins associated with three general transcriptional regulators. Square nodes represent the three 
transcriptional regulators. Node size is proportional to the degree of the node. Nodes are colored according to their functional classification in 
the eggNOG database [121]. The COG categories [122] are: (J) Translation, ribosomal structure and biogenesis, (A) RNA processing and 
modification, (K) Transcription, (L) Replication, recombination and repair, (B) Chromatin structure and dynamics, (D) Cell cycle control, cell division, 
chromosome partitioning, (Y) Nuclear structure, (V) Defense mechanisms, (T) Signal transduction mechanisms, (M) Cell wall/membrane/envelope 
biogenesis, (N) Cell motility, (Z) Cytoskeleton, (W) Extracellular structures, (U) Intracellular trafficking, secretion, and vesicular transport, 
(O) Posttranslational modification, protein turnover, chaperones, (C) Energy production and conversion, (G) Carbohydrate transport and 
metabolism, (E) Amino acid transport and metabolism, (F) Nucleotide transport and metabolism, (H) Coenzyme transport and metabolism, (I) 
Lipid transport and metabolism, (P) Inorganic ion transport and metabolism, (Q) Secondary metabolites biosynthesis, transport and catabolism, 
(R) General function prediction only, and (S) Function unknown. Confidence scores for the interactions among the nodes (S values from STRING) 
were divided into three groups - low (0.150-0.399), medium (0.400-0.700) and high (0.701-0.999); the groups are represented by thin, medium 
and heavy lines, respectively. 



The ApiAP2 protein with the highest connectivity is 
PFD0985w, which has 17 interaction partners (Figure 4). It 
has direct physical interactions with two other ApiAP2 
proteins (PF07_0126 and MAL8P1.153). It is associated 
with a nucleosome assembly protein (PFI0930c) that is 
implicated in chromatin remodeling, and a putative Ndc80 
homolog (PFF0785w) that may be a component of the 
mitotic spindle related to chromosome segregation. It is 
also associated with three surface antigens: a reticulocyte 
binding protein 2 homologue a (PF13_0198), 
which may play a role in determining host-cell invasion 
specificity [96]; antigen 332 (PF11_0506) in the Duffy 
binding-like (DBL) protein family, which may be related 
to parasite entry into the host; and an asparagine-rich 
antigen (PF08_0060). PFD0985w also 
appeared to be related to a number of secreted proteins, 
including a putative secreted ookinete protein (PFA0430c) 
and two proteins associated with Maurer's clefts 
[97], parasite-derived membranous structures within the 
host cell cytoplasm: PfSec31 (PFB0640c), a COPII-coated 
vesicle component, and PHISTb (PFD0080c). In 
addition, PFD0985w has direct PPIs with the 26S proteasome 
AAA-ATPase subunit RPT3 (PFD0665c), which is a 
component of the ubiquitin-proteasome system for protein 
degradation, and pyruvate kinase (PFF1300w), an essential 
enzyme for glycolysis. 

Figure 3 Phylogenetic tree of the ApiAP2 transcriptional regulator family in P. falciparum. The tree was constructed using the neighbor- 
joining method [120]. Eleven of the 27 members were predicted by the subnetwork alignment algorithm. •: ApiAP2 protein with 1 AP2 domain; 
▲: ApiAP2 protein with 2 AP2 domains; ■: ApiAP2 protein with 3 AP2 domains. 

The ApiAP2 protein with the second highest connectivity is 
PF07_0126. It has 15 PPI partners (Figure 4) that can be 
divided into five categories: (1) transcriptional regulation. It 
is associated with two other ApiAP2 proteins (PFD0985w 
and PFF0200c), and a CCAAT-box DNA binding protein 
subunit B (PF11_0477); (2) epigenetic regulation. It is 
associated with PfHMGB2 (MAL8P1.72), which has a DNA-binding 
domain: HMG-box (High Mobility Group box). 
The proteins in this family have been implicated in regulation 
of transcription, replication, repair, and chromatin 
remodeling; (3) signaling. PF07_0126 is associated with at 
least three putative signaling proteins, including (a) 
PF13_0042, which contains a forkhead-associated domain 
that is found in a variety of regulatory proteins involved in 
signaling, (b) a calcium/calmodulin-dependent protein 
kinase (PF11_0060) that is implicated in signaling cascades, and 
(c) a putative 14-3-3 protein (MAL8P1.69). Proteins in the 
14-3-3 family include regulatory ligands to various signaling 
molecules such as kinases and receptors; (4) surface 
antigens for cell adhesion and entry to the host. PF07_0126 
is associated with a Duffy binding-like antigen 332 
(PF11_0506), an erythrocyte membrane-associated antigen 
(PFD1045c), and a QF122 antigen (PF10_0115) with an 
RNA-binding motif; (5) metabolism. The glycolytic enzyme 
fructose-bisphosphate aldolase (PF14_0425) is associated 
with the ApiAP2 protein PF07_0126. 

Figure 4 A graph showing the proteins associated with 11 predicted ApiAP2 transcriptional regulators. Square nodes represent ApiAP2s. 
Node size is proportional to the degree of the node. Nodes are colored according to their functional classification in the eggNOG database 
[121]. The visualization is as for Figure 2. 

The role of ApiAP2 proteins in transcriptional and epigenetic 
regulation is also indicated by a direct PPI between 
a putative ApiAP2 with 3 AP2 domains (PF10_0075) and a 
histone acetyltransferase, GCN5 (PF08_0034), an enzyme 
for histone modification and chromatin remodeling [98]. 
This ApiAP2 protein may also be involved in the regulation 
of genome integrity through a PPI with a DNA repair 
protein rhp16 (PFL2440w), and in cytoskeleton organization 
of actin (Figure 4). 

Two of these 11 ApiAP2 proteins have been experimentally 
characterized to some extent: (1) the crystal structure 
of the AP2 domain of PF14_0633 has been determined, 
revealing a multiple-site binding pattern [99], and gene 
disruption assays showed that its ortholog in the rodent 
parasite P. berghei was an indispensable regulator of sporozoite 
development in the mosquito stage [94]. However, 
its regulatory roles and targets remain uncharacterized in 
P. falciparum. As shown in Figure 4, it has only two direct 
PPIs revealed by Y2H assays [57]: the first is with a ribosomal 
protein P0, and the second is with PTEX150 (PF14_0344), 
an important component of a translocon of exported 
proteins (PTEX). Located in the vacuole membrane, PTEX 
was recently discovered as a novel ATP-dependent protein 
trafficking machinery [100]. Notably, PTEX150 is only present 
in the Plasmodium genus. The PPI between PTEX150 
and ApiAP2 suggests that this export machinery may have 
parasite-specific regulation. PTEX is becoming an attractive 
therapeutic target due to its importance to virulence 
and parasite survival and its distant evolutionary relatedness 
to the human host. (2) PF11_0442. Its counterpart in 
P. berghei is a transcription factor that regulates ookinete-specific 
gene expression for parasite invasion of the mosquito 
midgut. PF11_0442, however, may play a role in the 
red blood cell (RBC) stage: it has one PPI partner, rhoptry-associated 
protein 1 (RAP1, PF14_0102). RAP1 is an 
escort protein required to localize RAP2 to the rhoptries, 
apical organelles essential for RBC invasion [101]. 

In summary, ApiAP2 proteins are a family of stage-specific 
transcriptional regulators for diverse processes 
ranging from epigenetic modification, chromosome 
organization and dynamics, invasion, protein sorting and 
trafficking, and protein turnover to metabolism. 

Other potentially important proteins that may be 
involved in transcriptional regulation 

Module-based subnetwork alignments predicted additional 
proteins that are likely involved in transcriptional 
regulation (Table 1). Two proteins (PFD0685c and 
MAL13P1.96) are members of the SMC (structural 
maintenance of chromosomes) superfamily; they both 
have a RecF/RecN/SMC N-terminal domain and may be 
involved in chromatin cohesion and dynamics. A number 
of zinc-finger proteins were identified by network 
alignments as well. They exhibit different types of 
domain configurations, including the classical DNA- 
binding motif C2H2 found in transcription factors, the 
C3HC4 type (RING finger) typically found in proteins 
mediating ubiquitination, the C-x8-C-x5-C-x3-H (CCCH) 
type implicated in cell cycle regulation, the DHHC type 
found in proteins important for membrane association 
and trafficking, the DNL type implicated in protein trans- 
location into mitochondria, and the CW type related to 
DNA-binding and protein-protein interaction. A putative 
transcriptional coactivator (ADA2, PF10_0143) has a ZZ- 
type zinc finger domain. ADA2 was shown, in baker's 
yeast and Arabidopsis thaliana, to physically interact with 
GCN5, a histone acetyltransferase and a potent transcrip- 
tional activator [102,103]. The Y2H assay in P. falciparum 
[57] revealed that ADA2 has direct physical interactions 
with proteins including a minichromosome maintenance 
(MCM) complex subunit (PF14_0177), a pre-mRNA 
splicing factor (PFD0265w), a heat shock protein hsp70 
interacting protein (PFE1370w), a sodium-dependent 
phosphate transporter (MAL13P1.206), a serine/threonine 
protein kinase in the FIKK family (PFA0130c), cathepsin C 
(PF11_0174), and a mature parasite-infected erythrocyte 
surface antigen (PFE0040c), suggesting its potential versa- 
tile roles in DNA replication, splicing, transport, protein 
processing, signal transduction, and invasion. 

Other putative transcriptional regulators include 
PFE0870w and PF14_0170. PFE0870w contains two functional 
domains: a FACT complex subunit (SPT16/CDC68) 
domain, which was reported to facilitate transcriptional 
initiation and interact with nucleosomes and histones 
[104], and a histone chaperone Rttp106-like domain. This 
protein may be involved in heterochromatin silencing and 
epigenetic regulation. PF14_0170 is a putative protein in 
the NOT global transcriptional regulator family. Y2H 
assays showed that it had direct physical interactions with 
CCAAT-box DNA binding protein subunit B (PF11_0477), 
DNA topoisomerase II (PF14_0316), and calcium-dependent 
protein kinase 1 (PFB0815w), emphasizing its involvement 
in general transcriptional control, chromosome 
topology, and signaling processes. It also has a PPI with a 
Pf11-1 protein (PF10_0374), which may play a role in protein 
trafficking processes associated with Maurer's clefts. 

Conclusions 

A functional module-based alignment approach was used 
to predict system components in transcriptional regulatory 
networks in the malaria parasite P. falciparum. Our 
results predicted general transcriptional regulators that 
may regulate gene expression in a global or pleiotropic 
mode. Our results also point to a group of parasite-specific 
transcriptional regulators in the ApiAP2 family that play 
roles in diverse cellular processes ranging from chromatin 
remodeling, protein sorting and secretion, and signal 
transduction to invasion. Finally, our analysis has identified 
other potentially important proteins involved in 
transcriptional regulation. Our present knowledge of the 
transcriptional machinery and its regulatory capacity is 
rudimentary. The identification of network components 
in this machinery will open new avenues to the develop- 
ment of novel therapeutic targets and provide new 
insights into parasite biology, pathogenesis and virulence. 

The premise of our subnetwork alignment approach is 
that functional annotations of the proteins can be trans- 
ferred across species through conserved interactions in the 
aligned PPI networks. Under this framework, a priori 
information as to the identity or function of a gene is not 
necessary for the gene to be placed in a network. Thus 
genes identified only because of their key role in a network 
become potential targets. Furthermore, placement of the 
gene product in a systems context could, in itself, serve to 
identify the function of the gene product. If successfully 
applied, a systems biology approach circumvents the limit- 
ing factor in comparative genomics - the difficulty in 
obtaining reliable functional assignments. 

Methods 

Ortholog prediction by subnetwork querying 

To predict functional orthologs for P. falciparum proteins, 
we formulated the problem as subnetwork querying. We 
first mapped the annotated E. coli transcriptional factors 
(GO:0003700: transcription factor activity) into the E. coli 
protein-protein interaction network. For each transcrip- 
tional factor, nearby neighbors were selected to form its 
neighborhood subnetwork. Similarly, each P. falciparum 
protein was mapped into the P. falciparum PPI network 
and a neighborhood subnetwork was built to include 
its nearby neighbors. Since the E. coli network and the 
P. falciparum network differ in size and density, the 
nearby neighbors were selected with a rule to control the 
neighborhood size. Let $N_k(p)$ denote the set of proteins 
that are exactly $k$ hops from the central protein $p$. The 
neighborhood of central protein $p$ is 
$N(p) = N_1(p) \cup N_2(p) \cup \dots \cup N_k(p)$ such that $|N(p)| < 500$. Specifically, we first 
included the neighboring proteins that are 1 hop from the 
central protein. If the size of the neighborhood was less 
than 500, we continued to include the proteins that were 2 
hops from the central protein. We kept increasing the hop 
distance until adding the next hop would have brought the 
neighborhood size to 500 or more. In other words, nearby 
proteins were selected by their distance to the central 
protein and the neighborhood size 
was kept below 500 unless the central protein had more 
than 500 direct neighbors in the PPI network. 
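
The selection rule above can be illustrated with a short sketch. It is only one possible reading of the rule, not the authors' implementation: the function name `neighborhood`, the adjacency-dictionary input, and the choice to always keep the 1-hop layer and to stop before a further layer would bring the size to the limit are assumptions made for illustration.

```python
def neighborhood(adj, center, max_size=500):
    """Collect proteins around `center` hop by hop (hypothetical helper).

    adj      -- dict mapping each protein to an iterable of its interaction partners
    center   -- the central protein p
    max_size -- size limit for N(p); 500 in the text

    Returns a dict {protein: hop distance}; the central protein has distance 0
    and is not counted towards the size of N(p).
    """
    hops = {center: 0}
    frontier = {center}
    k = 0
    while frontier:
        k += 1
        # Proteins exactly k hops away: unseen partners of the previous layer.
        layer = {v for u in frontier for v in adj.get(u, ()) if v not in hops}
        if not layer:
            break
        # Assumption: direct neighbors (k = 1) are always kept; a later layer is
        # added only if the neighborhood stays strictly below max_size.
        if k > 1 and (len(hops) - 1) + len(layer) >= max_size:
            break
        for v in layer:
            hops[v] = k
        frontier = layer
    return hops
```

Returning the hop distances rather than a bare set keeps the distance of every protein to the central protein available for later scoring.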

After we obtained the neighborhood subnetwork for 
the E. coli transcriptional factors and all the P. falci- 
parum proteins, we aligned each E. coli subnetwork 
against all the P. falciparum subnetworks. The central 
protein of the best-aligned P. falciparum subnetwork 
was identified as the functional ortholog of the E. coli 
transcriptional factor. 
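
As a rough sketch of this querying step, each E. coli subnetwork can be scored against every P. falciparum subnetwork and the best-scoring central protein reported. The names `predict_orthologs` and `toy_score` are hypothetical, and the placeholder score below is not the alignment score used in this study; any function that scores a pair of subnetworks can be passed in its place.

```python
def predict_orthologs(tf_subnets, pf_subnets, score):
    """For each E. coli transcription factor, pick the P. falciparum central
    protein whose neighborhood subnetwork aligns best (hypothetical helper).

    tf_subnets -- dict {E. coli TF: its neighborhood subnetwork}
    pf_subnets -- dict {P. falciparum protein: its neighborhood subnetwork}
    score      -- callable(subnet_q, subnet_p) -> float, higher is better
    """
    predictions = {}
    for tf, s_q in tf_subnets.items():
        # Align this E. coli subnetwork against all P. falciparum subnetworks.
        best = max(pf_subnets, key=lambda prot: score(s_q, pf_subnets[prot]))
        predictions[tf] = best
    return predictions


def toy_score(s_q, s_p):
    """Placeholder score for illustration only: prefers subnetworks of similar size."""
    return -abs(len(s_q) - len(s_p))
```

Several transcription factors may map to the same P. falciparum protein under this scheme, since each query is scored independently.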

Aligning neighborhood subnetworks with graph kernel 

To evaluate how well a P. falciparum neighborhood sub- 
network aligned with an E. coli neighborhood subnetwork, 
we assigned a score for each possible alignment and sum- 
marized the alignment scores with a graph kernel. Graph 
kernels are effective approaches to measure the similarity 
between two labeled networks [105,106]. Given a pair of 
labeled graphs, a graph kernel is designed to summarize 
all possible isomorphic subgraphs (exact matches) in the 
two graphs. However, since there are an exponential num- 
ber of subgraphs, it is computationally infeasible to detect 
all isomorphic subgraphs. A simplification is to compute 
the number of common paths between two graphs by a 
random walk on a product graph of the two compared 
graphs or by dynamic programming [107-109]. Alternatively, 
a graph kernel can also explicitly summarize the 
similarity between the shortest paths in the two graphs 
with each pair of shortest paths measured by a convolu- 
tion kernel [110]. Since our focus is only on the paths that 
go through the central protein, we modified the shortest 
path graph kernel to only consider the paths between the 
central protein and the other proteins in the subnetwork. 
The underlying hypothesis is that each shortest path going 
through the central protein can characterize the functional 
role of the protein in the chained molecular activities 
along the path. As shown in Figure 5, given two subnetworks 
$S_p$ with central protein $p$ and $S_q$ with central protein 
$q$, we define a simple shortest path similarity function, 

$$K(S_q, S_p) = \frac{1}{|S_q| + |S_p|} \sum_{\forall (i_1, i_2) \in S_q} B\big((i_1, i_2), S_p\big),$$

where 

$$B\big((i_1, i_2), S_p\big) = \max_{\forall (j_1, j_2) \in S_p} \frac{2\,E(i_1, j_1)\,E(i_2, j_2)}{\mathrm{dist}(i_1, i_2) + \mathrm{dist}(j_1, j_2)},$$

$E(x, y) = \exp\left(-\frac{\mathrm{Eval}(x, y)}{\sigma}\right)$ with the normalization parameter 
$\sigma = 10$ measures the sequence similarity between 
proteins $x$ and $y$ based on the E-value of the sequence 
alignment, and $\mathrm{dist}(x, y)$ is the length of the shortest path 
connecting proteins $x$ and $y$ in the PPI subnetwork. Since 
the scores were small numbers, the computation was done 
in $-\log_{10}$ scale. 

Figure 5 Computation of subnetwork alignment score. The alignment score between subnetworks $S_p$ and $S_q$ is the summation of the 
similarity scores between all pairs of matched shortest paths ((i1, i2) and (j1, j2) in the figure), calculated based on the sequence similarities (E(i1, 
j1) and E(i2, j2)) and the distances in the subnetworks (dist(i1, i2) and dist(j1, j2)). 

In this similarity function, we took each pair 
of proteins $(i_1, i_2)$ in one subnetwork and identified the 
pair $(j_1, j_2)$ in the other subnetwork that gives the maximum 
ratio between their sequence similarity with respect to $(i_1, i_2)$ 
and the closeness in the subnetworks. Specifically, we 
computed the shortest path through the central protein 
between all pairs of proteins in the neighborhood subnet- 
work. The shortest paths of two neighborhood subnet- 
works are then compared and scored pairwise. The total of 
the alignment scores was reported as the subnetwork align- 
ment score. Our strategy is to incorporate both the 
sequence similarity of the proteins and the role of the cen- 
tral proteins in the subnetwork in the similarity measure, 
which summarizes the functional coherence between the 
two subnetworks and between the two central proteins of 
the two subnetworks. 
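A minimal sketch of how the shortest path similarity function K(S_q, S_p) defined in this section could be computed is given here. It rests on several assumptions that are not stated in the text: subnetworks are unweighted adjacency dictionaries with every protein present as a key, |S| is taken as the number of proteins, pairs (i1, i2) are ordered pairs of distinct non-central proteins, dist is the length of the path through the central protein, and protein pairs without a reported E-value contribute zero similarity. The -log10 rescaling is not reproduced, and all function names are hypothetical.

```python
from collections import deque
import math

SIGMA = 10.0  # normalization parameter given in the text


def bfs_distances(adj, center):
    """Hop distances from the central protein to every reachable protein."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def through_center_pairs(adj, center):
    """Ordered pairs of non-central proteins -> path length via the center."""
    d = bfs_distances(adj, center)
    others = [n for n in d if n != center]
    return {(a, b): d[a] + d[b] for a in others for b in others if a != b}


def alignment_score(adj_q, q, adj_p, p, evalues):
    """K(S_q, S_p): normalized sum over query pairs of the best match B."""
    def seq_sim(x, y):
        ev = evalues.get((x, y))  # assumption: missing pair = no similarity
        return 0.0 if ev is None else math.exp(-ev / SIGMA)

    pairs_q = through_center_pairs(adj_q, q)
    pairs_p = through_center_pairs(adj_p, p)
    total = 0.0
    for (i1, i2), d_i in pairs_q.items():
        best = 0.0
        for (j1, j2), d_j in pairs_p.items():
            b = 2.0 * seq_sim(i1, j1) * seq_sim(i2, j2) / (d_i + d_j)
            best = max(best, b)
        total += best
    return total / (len(adj_q) + len(adj_p))


# Hypothetical toy subnetworks: two star graphs with perfectly matched leaves.
adj_q = {"q": ["a", "b"], "a": ["q"], "b": ["q"]}
adj_p = {"p": ["c", "d"], "c": ["p"], "d": ["p"]}
toy_evalues = {("a", "c"): 0.0, ("b", "d"): 0.0}
k_toy = alignment_score(adj_q, "q", adj_p, "p", toy_evalues)
```

In the toy case each ordered query pair finds a match with B = 2/(2 + 2) = 0.5, so the score is (0.5 + 0.5)/(3 + 3) = 1/6.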

Network data and analysis 

The E. coli protein-protein interactions were obtained 
from the IntAct database, which provides binary protein-protein 
interactions derived from literature curation 
or direct user submissions. The complete set of protein-protein 
associations for P. falciparum was extracted from 
the STRING database [111]; each association between a 
pair of proteins has a confidence score (S) ranging from 
0.15 to 0.999, based on the evidence from sequence simi- 
larity comparison, pathway (KEGG [112] and PlasmoCyc 
[52]) assignments, genome neighborhood analysis, phylo- 
genetic inference, and literature co-occurrence. The asso- 
ciations were visualized in Cytoscape [113] and converted 
to an undirected weighted graph, where there is a single 
edge between any pair of proteins and the S value is used 
as the weight. The network was characterized using Net- 
workAnalyzer [114]. The default values were used for all 
three plugins. The set of proteins associated with tran- 
scriptional regulation were screened using BiNGO [115] 
to determine if any categories of proteins, as identified by 
their Gene Ontology terms, were enriched. The hypergeometric 
test was used with the Benjamini and Hochberg 
false discovery rate correction. A significance level of 0.05 
was selected. 
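The enrichment statistics named here can be illustrated with a short standard-library sketch of the hypergeometric upper-tail test and the Benjamini and Hochberg adjustment. It is not BiNGO's code; the function names and the toy p-values are assumptions introduced for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): k or more annotated proteins in a sample of n,
    drawn from N proteins of which K carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy p-values for four GO categories.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
significant = [p <= 0.05 for p in toy_adjusted]
```

A category would be reported as enriched when its adjusted p-value does not exceed the chosen significance level of 0.05.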

The omics data mining 

P. falciparum genomic sequence and annotation data [5], 
transcriptomic microarray data [7,9,12], mass-spectrome- 
try proteomic data [34,35,39,40], and protein-protein 
interactome [57] data for network associated proteins 
were downloaded from PlasmoDB (http://www.plasmodb.org) 
[116]. Conserved domains/motifs were identified 
by searching InterPro [117]. Multiple alignments 
were obtained using the ClustalX program [118] and 
T-Coffee [119], followed by manual inspection and editing. 
Phylogenetic trees were inferred by the neighbor-joining 
method implemented in MEGA5 [120]. Bootstrap resampling 
with 1,000 replicates was carried out to assess support 
for individual branches. Branches with bootstrap values of < 50% 
were collapsed and treated as polytomies. 
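Collapsing weakly supported branches into polytomies can be sketched as follows. This is an illustrative sketch only, not the MEGA5 procedure: the nested-dictionary tree representation, the `support` field, and the function name are assumptions made here.

```python
def collapse_low_support(node, threshold=50.0):
    """Return a copy of the tree in which internal branches with bootstrap
    support below threshold are removed, attaching their children directly
    to the parent (which may create a polytomy)."""
    new_children = []
    for child in node.get("children", []):
        child = collapse_low_support(child, threshold)
        is_internal = bool(child.get("children"))
        # Assumption: internal nodes without a support value are kept.
        if is_internal and child.get("support", 100.0) < threshold:
            new_children.extend(child["children"])
        else:
            new_children.append(child)
    result = dict(node)
    if new_children:
        result["children"] = new_children
    return result


# Hypothetical toy tree: one clade with 30% support, one with 90%.
toy_tree = {
    "children": [
        {"support": 30, "children": [{"name": "A"}, {"name": "B"}]},
        {"support": 90, "children": [{"name": "C"}, {"name": "D"}]},
    ]
}
collapsed = collapse_low_support(toy_tree)
```

In the toy tree the 30% clade is dissolved, so the root ends up with three children: A, B, and the well-supported (C, D) clade.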
EBA: erythrocyte binding antigen; DBL: Duffy binding-like; GO: Gene 